近年来,视觉模型(VLMS)在视觉推理任务(例如属性,位置)上表现出色。尽管这些任务衡量了必要的知识以在给定的视觉实例上进行理性和理性,但是它们却不能衡量VLMS保留和概括此类知识的能力。在这项工作中,我们评估了他们获得“可见”物理知识的能力 - 从静态场景的图像,尤其是在对象颜色,大小和空间的尺寸的范围内,可以轻松访问的信息。我们建立了自动管道,以获取综合知识资源,用于校准和探索这些模型。我们的结果表明,在所有三个任务中,模型和人类绩效之间存在严重的差距。此外,我们的字幕预测的基线(Capbert)在大小和空间任务上都大大优于VLM,这强调了,尽管有足够的视觉方式访问了地面语言,但它们仍在努力保留此类知识。数据集和代码可在https://github.com/axe--/viphy上找到。
translated by 谷歌翻译
智能交通监控(ITMO)的挑战因大量和模式的数据以及对最先进的(SOTA)推理者的利用的需求而加剧。我们提出ITMO的问题,并引入HANS,这是一种用于多模式上下文理解的神经符号结构及其在ITMO中的应用。汉斯利用知识图技术作为交通域中SOTA推理的骨干。通过案例研究,我们展示了汉斯如何解决与交通监控相关的挑战,同时能够与广泛的推理方法集成
translated by 谷歌翻译
诸如“玻璃可以用于饮用水”之类的先决条件的推理仍然是语言模型的开放问题。主要的挑战在于,前提数据的稀缺性以及模型对这种推理的缺乏支持。我们提出了粉红色的,预处理性的推论,并通过弱监督进行了改进的模型,用于通过最低限度的监督来推理前提条件。我们从经验和理论上表明,粉红色改善了基准的结果,该基准的重点是通过常识性知识的前提(高达40%的宏F1分数)进行推理。我们通过Pac-Bayesian信息分析,精确度量和消融研究进一步研究粉红色。
translated by 谷歌翻译
Finding and localizing the conceptual changes in two scenes in terms of the presence or removal of objects in two images belonging to the same scene at different times in special care applications is of great significance. This is mainly due to the fact that addition or removal of important objects for some environments can be harmful. As a result, there is a need to design a program that locates these differences using machine vision. The most important challenge of this problem is the change in lighting conditions and the presence of shadows in the scene. Therefore, the proposed methods must be resistant to these challenges. In this article, a method based on deep convolutional neural networks using transfer learning is introduced, which is trained with an intelligent data synthesis process. The results of this method are tested and presented on the dataset provided for this purpose. It is shown that the presented method is more efficient than other methods and can be used in a variety of real industrial environments.
translated by 谷歌翻译
Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including the cross-attention module to ensure the feature reusability and enhance the localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.
translated by 谷歌翻译
The stock market prediction has been a traditional yet complex problem researched within diverse research areas and application domains due to its non-linear, highly volatile and complex nature. Existing surveys on stock market prediction often focus on traditional machine learning methods instead of deep learning methods. Deep learning has dominated many domains, gained much success and popularity in recent years in stock market prediction. This motivates us to provide a structured and comprehensive overview of the research on stock market prediction focusing on deep learning techniques. We present four elaborated subtasks of stock market prediction and propose a novel taxonomy to summarize the state-of-the-art models based on deep neural networks from 2011 to 2022. In addition, we also provide detailed statistics on the datasets and evaluation metrics commonly used in the stock market. Finally, we highlight some open issues and point out several future directions by sharing some new perspectives on stock market prediction.
translated by 谷歌翻译
Existing training criteria in automatic speech recognition(ASR) permit the model to freely explore more than one time alignments between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e. how it chooses to distribute the probability mass over the set of allowed alignments. Furthermore, we evaluate the effect of entropy regularization in encouraging the model to distribute the probability mass only on a smaller subset of allowed alignments. Experiments show that entropy regularization enables a much simpler decoding method without sacrificing word error rate, and provides better time alignment quality.
translated by 谷歌翻译
Climate change has increased the intensity, frequency, and duration of extreme weather events and natural disasters across the world. While the increased data on natural disasters improves the scope of machine learning (ML) in this field, progress is relatively slow. One bottleneck is the lack of benchmark datasets that would allow ML researchers to quantify their progress against a standard metric. The objective of this short paper is to explore the state of benchmark datasets for ML tasks related to natural disasters, categorizing them according to the disaster management cycle. We compile a list of existing benchmark datasets introduced in the past five years. We propose a web platform - NADBenchmarks - where researchers can search for benchmark datasets for natural disasters, and we develop a preliminary version of such a platform using our compiled list. This paper is intended to aid researchers in finding benchmark datasets to train their ML models on, and provide general directions for topics where they can contribute new benchmark datasets.
translated by 谷歌翻译
Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-existing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves the state-of-the-art performance on the task of recipe generation
translated by 谷歌翻译
We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly. we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrate significantly improved robustness--up to 20%--compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets.
translated by 谷歌翻译